Representing Videos based on Scene Layouts for Recognizing Agent-in-Place Actions

Authors

Ruichi Yu

Hongcheng Wang

Ang Li

Jingxiao Zheng

Vlad I. Morariu

Larry S. Davis

Abstract

We address the recognition of agent-in-place actions, which are associated with the agents who perform them and the places where they occur, in the context of outdoor home surveillance. We introduce a representation of the geometry and topology of scene layouts so that a network can generalize from the layouts observed in the training set to unseen layouts in the test set. This Layout-Induced Video Representation (LIVR) abstracts away low-level appearance variance and encodes geometric and topological relationships of places in a specific scene layout. LIVR partitions the semantic features of a video clip into different places to force the network to learn place-based feature descriptions; to predict the confidence of each action, LIVR aggregates features from the place associated with an action and its adjacent places on the scene layout. We introduce the Agent-in-Place Action dataset to show that our method allows neural network models to generalize significantly better to unseen scenes.
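
The two steps named in the abstract, partitioning features by place and aggregating over a place and its neighbours, can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the binary place masks, the adjacency matrix, the place names (street, sidewalk, porch), the masked-average descriptor, and the mean aggregation are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def place_descriptors(features, place_masks):
    """Masked-average one descriptor per place (an assumed pooling choice).

    features:    (C, H, W) feature map of a clip or frame
    place_masks: (P, H, W) binary masks, one per place in the layout
    returns:     (P, C) array, one descriptor per place
    """
    masks = place_masks.astype(features.dtype)
    # Sum each channel over the pixels that belong to each place.
    summed = np.einsum("chw,phw->pc", features, masks)
    # Divide by place area; guard against empty masks.
    area = masks.sum(axis=(1, 2))[:, None]
    return summed / np.maximum(area, 1.0)


def aggregate_for_action(descriptors, adjacency, place_index):
    """Average the descriptors of a place and its adjacent places.

    adjacency is a (P, P) 0/1 matrix describing the layout topology
    (hypothetical encoding; mean aggregation is also an assumption).
    """
    selected = adjacency[place_index].astype(bool)  # astype returns a copy
    selected[place_index] = True  # always include the action's own place
    return descriptors[selected].mean(axis=0)


# Toy layout: 2 channels on a 2x4 grid, split into three places.
features = np.arange(16, dtype=float).reshape(2, 2, 4)
place_masks = np.zeros((3, 2, 4), dtype=int)
place_masks[0, :, 0:2] = 1  # "street"   (hypothetical place)
place_masks[1, :, 2] = 1    # "sidewalk" (hypothetical place)
place_masks[2, :, 3] = 1    # "porch"    (hypothetical place)

# street <-> sidewalk <-> porch; street and porch are not adjacent.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])

descriptors = place_descriptors(features, place_masks)
porch_feature = aggregate_for_action(descriptors, adjacency, place_index=2)
```

In this toy layout, a porch action draws on the porch and sidewalk descriptors but ignores the street, which is the kind of topology-aware restriction the abstract describes.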

Similar resources

Compressed Domain Scene Change Detection Based on Transform Units Distribution in High Efficiency Video Coding Standard

Scene change detection plays an important role in a number of video applications, including video indexing, searching, browsing, semantic features extraction, and, in general, pre-processing and post-processing operations. Several scene change detection methods have been proposed in different coding standards. Most of them use fixed thresholds for the similarity metrics to determine if there wa...

Representation and Visual Recognition of Complex, Multi-agent Actions using Belief Networks

A probabilistic framework for representing and visually recognizing complex multi-agent action is presented. Motivated by work in model-based object recognition and designed for the recognition of action from visual evidence, the representation has three components: (1) temporal structure descriptions representing the logical and temporal relationships between agent goals, (2) belief networks f...

Generating Videos with Scene Dynamics

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction). We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene’s foreground from the background. Experiments suggest this ...

Trajectory aligned features for first person action recognition

Egocentric videos are characterised by their first-person view. With the popularity of Google Glass and GoPro, the use of egocentric videos is on the rise. Recognizing the wearer's actions from egocentric videos is an important problem. Unstructured movement of the camera due to natural head motion of the wearer causes sharp changes in the visual field of the egocentric camera c...

Publication date: 2018